
All Questions

0 votes · 0 answers · 20 views

Python - adding more timesteps makes my model "fail"

Hi! I have just made my first model in stable-baselines3 using pygame in Python. The game is about a ball reaching the highest platform out of three placed in the sky. Now, after a few days of trying ...
asked by Skorejen
0 votes · 1 answer · 164 views

Reward not improving for a custom environment using PPO

I've been trying to train an agent on a custom environment I implemented with gym where the goal is to resolve voltage violations in a power grid by adjusting the active power (loads) at each node. I ...
asked by W8_4_it
1 vote · 1 answer · 87 views

Deep RL problem: Loss decreases but agent doesn't learn

I'm implementing a basic Vanilla Policy Gradient algorithm for the CartPole-v1 gymnasium environment, and I don't know what I'm doing wrong. No matter what I try, during the training loop the loss ...
asked by wildBass
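A common source of this confusion: in policy-gradient methods the reported loss is only a surrogate whose magnitude scales with the returns, so a decreasing loss says little about learning progress; average episode return is the signal to watch. A minimal sketch of the surrogate, assuming a PyTorch categorical policy (all names hypothetical):

```python
import torch

def vpg_loss(logits, actions, returns):
    """REINFORCE-style surrogate loss (hypothetical helper, PyTorch assumed).

    logits:  (T, n_actions) action logits from the policy network
    actions: (T,) actions actually taken
    returns: (T,) rewards-to-go (ideally normalized)
    """
    log_probs = torch.distributions.Categorical(logits=logits).log_prob(actions)
    # The gradient of this quantity is the policy gradient; its *value*
    # tracks the scale of `returns`, so it is not a progress metric.
    return -(log_probs * returns).mean()
```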
1 vote · 0 answers · 22 views

Optimizing Wind Park Layout Using Direct Action-to-Input Mapping

I’m optimizing a black-box objective function where the task is to find the optimal turbine locations in a wind park. Previously, I used a PPO reinforcement learning approach with a step-by-step ...
asked by Shahriar
1 vote · 1 answer · 156 views

How Do I Optimise a Black-Box Objective Function with DQN Using Reinforcement Learning?

I'm a beginner in the field of reinforcement learning, and I'm currently working on a problem that has me a bit stuck. I'm trying to optimize a black-box objective function using reinforcement ...
asked by Shahriar
0 votes · 1 answer · 289 views

Why is PPO not choosing a solution that is giving a higher cumulative reward?

I use PPO to train my fermenter (digital twin) to maximize enzyme (product) production. Action: 1 or 0, i.e., add substrate at a particular time or not, based on the cells and enzymes present in the tank ...
asked by user79474
1 vote · 0 answers · 177 views

Python libraries for multi-armed bandit problems [closed]

I am working on a problem that can be cast as a contextual bandit problem with a continuous action space. I would like to tackle it by using something like the contextual zooming algorithm from the ...
asked by Onil90
2 votes · 1 answer · 67 views

How does reward work while training a Reinforcement Learning agent?

I am using PPO (from Stable Baselines 3) to train an agent in an environment I created. I am confused about whether I should set the reward to 0 in the step function or not. Initially, I used to have self.reward = 0 in ...
asked by user79474
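For questions like this one, the gym/gymnasium contract is that step() returns the reward for the current transition only; the training algorithm (here PPO in Stable Baselines 3) accumulates returns itself. A minimal sketch, with all environment internals hypothetical:

```python
def step(self, action):
    self._apply_action(action)            # hypothetical dynamics update
    reward = self._step_reward(action)    # reward for THIS transition only
    terminated = self._goal_reached()     # hypothetical termination check
    truncated = False
    # No running self.reward is needed; PPO sums discounted rewards itself.
    return self._get_obs(), reward, terminated, truncated, {}
```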
1 vote · 1 answer · 158 views

Why are these two implementations of the $\epsilon$-greedy policy different?

According to the book Reinforcement Learning: An Introduction, the $\epsilon$-greedy policy can generally be implemented as: $$ \pi(a|s) = \begin{cases} \frac{\epsilon}{|A|} + 1 - \epsilon & \text{if } ...
asked by kklaw
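For reference, the full formula the excerpt truncates, assuming it follows the standard Sutton and Barto formulation, is:

$$
\pi(a|s) = \begin{cases} \frac{\epsilon}{|A|} + 1 - \epsilon & \text{if } a = \arg\max_{a'} Q(s, a') \\ \frac{\epsilon}{|A|} & \text{otherwise} \end{cases}
$$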
1 vote · 1 answer · 121 views

RL agent for autonomous vehicle is able to follow the road but can't avoid crashing at all (Highway-Env / Racetrack Env.)

I coded some deep RL algorithms (DQN and SAC) with tf2/keras to solve an environment where a vehicle needs to follow the track and avoid crashing into one other vehicle (there is only one other ...
asked by rafiqollective
1 vote · 1 answer · 478 views

Always getting the same action from an A2C in stable_baselines3

I'm quite new to RL and have been trying to train an A2C model from stable_baselines3 to derive an integer sequence based on 3 other input sequences of floats. I have a custom gym environment that ...
asked by Jesuspc
1 vote · 1 answer · 603 views

What is the problem in my implementation of actor critic?

I have been implementing both REINFORCE with baseline and actor-critic to solve "CartPole-v1". As a reminder, here is the presentation of the algorithms in Sutton and Barto's book (http://...
asked by Labo
1 vote · 1 answer · 443 views

OpenAI Gym. Training problem: invalid values [closed]

I have a problem with my reinforcement learning model. I am trying to simulate electric battery storage. To keep it as simple as possible, the efficiencies of charge, storage, and discharge are 100%. ...
asked by MiPre
3 votes · 0 answers · 152 views

Are there Reinforcement Learning algorithms specialized for the case $\gamma=0$?

I have a Reinforcement Learning problem where the optimal policy does not depend on the next state (i.e., $\gamma = 0$). I think this means that I only need an efficient exploration algorithm coupled ...
asked by AJSV
0 votes · 1 answer · 99 views

What would the "state space" and its Python implementation be for my simulation?

Context: I'm trying to build a social-consensus simulation involving two intelligent agents. The simulation involves a graph/network of nodes. Nearly all of these nodes (> 90%) will be green agents. ...
asked by The Pointer
